Adding correlated random effects to longitudinal data
Background
A simple random intercept per participant is a poor fit for repeated-measures data where the correlation between samples varies with time, i.e. where samples are more likely to be correlated when they are close together. This can be accounted for with a time-dependent correlation structure for the random effects. This document describes building and fitting such a model.
Method
Random intercept model
Consider \(n\) measurements of a binary outcome for \(N\) participants, where each participant may have a different number of measurements. We can construct a logistic regression model where the measurements \(y_{i},\ i = 1, 2, \ldots, n\) are given by:
\[ y_{i} \sim \text{Bernoulli}(p_{i}) \] and
\[ \text{logit}(p_{i}) = \beta_{0} + \sum_{j=1}^{P}{x_{ij}\beta_{j}} + \sum_{k=1}^{N}{z_{ik}\gamma_{k}} \]
Where \(\beta_{0}\) is the model intercept; \(x_{ij}\) is the value of the \(j\)th of \(P\) covariates for observation \(i\); \(\gamma_{k}\) is the value of the random effect for participant \(k\); and \(z_{ik}\) is the \((i,k)\) entry of the \(n \times N\) design matrix that encodes which sample belongs to which participant. The random intercept is fixed for each participant, no matter how close together or far apart in time the samples are, and is drawn from a normal distribution:
\[ \gamma_{k} \sim \text{Normal}(0, \sigma) \]
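The random intercept model can be sketched by simulation. A minimal Python example, where the participant counts, covariate, and parameter values are all hypothetical choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

N = 30          # participants (hypothetical)
n_per = 5       # measurements per participant (equal here for simplicity)
n = N * n_per

beta0 = -0.5                       # model intercept (illustrative value)
sigma = 1.0                        # SD of the random intercepts
gamma = rng.normal(0.0, sigma, N)  # one random intercept per participant

participant = np.repeat(np.arange(N), n_per)  # maps observation -> participant
x = rng.normal(size=n)                        # a single covariate
beta = 0.8

# Linear predictor: intercept + covariate effect + participant's intercept
eta = beta0 + beta * x + gamma[participant]
p = 1.0 / (1.0 + np.exp(-eta))     # inverse-logit
y = rng.binomial(1, p)             # Bernoulli outcomes
```

Note that the same \(\gamma_k\) is reused for every observation from participant \(k\), regardless of when the samples were taken, which is exactly the limitation the correlated model below addresses.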
Final model
So the final model becomes
\[ \boldsymbol{y} \sim \text{Bernoulli}(\boldsymbol{p}) \] Where the length-\(n\) vector \(\boldsymbol{y}\) is the vector of observations and \[ \text{logit}(\boldsymbol{p}) = \beta_{0} + \boldsymbol{X}\boldsymbol{\beta} + \boldsymbol{f} \] Where \[ \boldsymbol{f} = \boldsymbol{L\eta} \] \[ \boldsymbol{\eta} \sim \text{Normal}(\boldsymbol{0}, \boldsymbol{I}) \] \[ \boldsymbol{L}\boldsymbol{L}^{T} = \boldsymbol{\Sigma} \]
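This non-centred construction \(\boldsymbol{f} = \boldsymbol{L\eta}\) can be checked numerically. A sketch with a hypothetical \(2 \times 2\) covariance matrix; note that `np.linalg.cholesky` returns the lower-triangular factor, so \(\boldsymbol{L}\boldsymbol{L}^{T} = \boldsymbol{\Sigma}\):

```python
import numpy as np

rng = np.random.default_rng(0)

# A small example covariance matrix (any symmetric positive-definite one works)
Sigma = np.array([[1.0, 0.6],
                  [0.6, 1.0]])

# Lower-triangular Cholesky factor: L @ L.T reproduces Sigma
L = np.linalg.cholesky(Sigma)

# Draw many eta ~ Normal(0, I) and map each draw through L
eta = rng.standard_normal((100_000, 2))
f = eta @ L.T                 # each row is one draw of f = L @ eta

# The empirical covariance of f approaches Sigma
emp = np.cov(f, rowvar=False)
```

The non-centred form is used because sampling independent standard normals \(\boldsymbol{\eta}\) and transforming them is typically much easier for MCMC than sampling \(\boldsymbol{f}\) directly from a correlated distribution.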
Where the covariance matrix \(\boldsymbol{\Sigma}\) for two samples \(i\) and \(j\) separated by time \(t_{ij}\) is defined by
\[ \Sigma_{ij} = \alpha^{2} \exp\left(\frac{-t^{2}_{ij}}{2l^{2}}\right) + \delta_{ij}\sigma^{2} \]
Where \(\delta_{ij} = 1\) for \(i = j\) and \(0\) otherwise.
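A single entry of \(\boldsymbol{\Sigma}\) can be written directly from this definition. A sketch in which the function name and parameter values are arbitrary illustrative choices:

```python
import numpy as np

def kernel_entry(t_ij, i, j, alpha=1.0, l=0.5, sigma=0.3):
    """Sigma_ij: squared-exponential term plus a diagonal 'nugget'."""
    se = alpha**2 * np.exp(-t_ij**2 / (2 * l**2))
    nugget = sigma**2 if i == j else 0.0   # the delta_ij term
    return se + nugget

# On the diagonal (t_ij = 0, i == j) the entry is alpha^2 + sigma^2;
# off-diagonal entries decay towards zero as the time gap grows.
```

Here \(\alpha\) controls how strongly nearby samples are correlated, \(l\) sets the timescale over which that correlation decays, and \(\sigma\) absorbs sample-specific noise.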
\(\boldsymbol{X}\) is the matrix of covariates for a given sample (1 where a covariate is present, 0 where absent). The value of any entry of this matrix encoding antibiotic exposure is 1 whilst exposure is ongoing, and is then allowed to decay, a time \(t\) after cessation of exposure, as:
\[ \exp\left(\frac{-t}{\tau}\right) \]
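The encoded exposure covariate can be sketched as a small function; the name and default value of \(\tau\) are hypothetical:

```python
import numpy as np

def exposure_value(t_since_cessation, tau=1.0):
    """Antibiotic-exposure entry of X: 1 whilst exposure is ongoing,
    exp(-t / tau) a time t after exposure ends."""
    if t_since_cessation <= 0.0:          # exposure still ongoing
        return 1.0
    return np.exp(-t_since_cessation / tau)
```

Because \(\tau\) is given a prior and estimated alongside the other parameters, the model learns how quickly the effect of an antibiotic washes out rather than treating exposure as a hard on/off indicator.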
With priors \[\beta_{0}, \boldsymbol{\beta}, \sigma, \alpha, \tau \sim \text{StudentT}(3\ \text{df}, 0, 3)\] \[l \sim \text{InvGamma}(2, 2)\]
Where the time of sample collection is mean-centred and scaled, and the time since antibiotic exposure is scaled by the SD of the time variable. In practice, we can fit each participant separately to avoid constructing one very large covariance matrix over all samples.
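The scaling described above might look like the following, with hypothetical raw times in days:

```python
import numpy as np

# Hypothetical raw sample-collection times (days) and times since
# antibiotic cessation for the same observations
t_sample = np.array([0.0, 7.0, 14.0, 30.0, 60.0])
t_since_abx = np.array([3.0, 10.0, 17.0, 33.0, 63.0])

sd = t_sample.std()
t_sample_scaled = (t_sample - t_sample.mean()) / sd  # mean-centred and scaled
t_since_abx_scaled = t_since_abx / sd                # scaled by the same SD
```

Using the same SD for both variables keeps the length-scale \(l\) and decay constant \(\tau\) on a common, interpretable timescale.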
Results
Parameter estimates from the models are shown in Figure 3. The conclusions are largely unchanged from the simpler random intercept models. The implied within-participant correlation between two samples is shown in Figure 4.